Daniel Simberloff¹


Use of Rarefaction and Related 
Methods in Ecology 


REFERENCE: Simberloff, Daniel, “Use of Rarefaction and Related Methods in 
Ecology,” Biological Data in Water Pollution Assessment: Quantitative and Statistical 
Analyses, ASTM STP 652, K. L. Dickson, John Cairns, Jr., and R. J. Livingston, 
Eds., American Society for Testing and Materials, 1978, pp. 150-165. 


ABSTRACT: Rarefaction is a statistical technique useful in both pollution and
evolutionary ecology. It can be used to infer whether samples are drawn from the 
same community and also to estimate minimum feasible sample size. In this setting 
it essentially tells the investigator what would have been found had sample size 
been smaller, but some surprising uses arise if questions are phrased properly. 
Rarefaction is most powerful in pollution ecology when entire curves, and not just 
single values, are calculated. In evolutionary ecology, species/genus and related 
ratios have been examined as an indicator of both competition and adaptive 
radiation, but rarefaction demonstrates that the former, at least, is rarely evidenced 
by such ratios. The ratios are largely determined by the number of species, and 
claimed relationships between the ratios and area are primarily artifacts of the high 
correlation between the ratios and species number. The analytic expressions for 
expectation and variance in rarefied samples and a FORTRAN IV program for 
their calculation are given. 


KEY WORDS: ecology, diversity, environments, hierarchical distribution, rarefac- 
tion, species-abundance curve 


Rarefaction is a statistical technique of broad utility in a variety of
ecological contexts; its first formal exposition was by Sanders [1],² but a
related method was used by Williams in 1964 [2]. Generally, rarefaction
can be used whenever entities of one hierarchical level are classified into
entities at a higher level; ecologists are most likely to be interested in
individuals apportioned into species [1,3], but one might also be concerned
with how species are apportioned into genera [4], species into
families, genera into families, species into orders [5], or any other
convenient hierarchical classification.

The general question asked takes either of two forms, and the information
which answers one form can always be used to answer the other. I will


¹ Associate professor, Department of Biological Science, Florida State University,
Tallahassee, Fla. 32306.


² The italic numbers in brackets refer to the list of references appended to this paper.




SIMBERLOFF ON USE OF RAREFACTION 151 


use the classification of individuals into species as an example of the first
type of question. For a sample of I individuals distributed among S species,
one might ask how many species S_i (S_i < S) would likely have been included
had the sample been smaller, of size I_i < I; a rarefaction curve (Fig. 1)
plots expected number of species versus number of individuals in a sample. Of
course such a manipulation would be convenient if one wished to compare
several samples of different sizes and the statistic S were of interest, but
less obvious questions may also be asked in this form. One may wish to
know, for example, whether two samples could reasonably be claimed to
have been drawn from the same parent population. Rarefaction is a weak
statistical technique for this purpose, since identities of the species are
not used; two samples rarefied to the same number of individuals might have
the same number of species but none of the same species, and rarefaction
would not detect a difference between them. But rarefaction is easily performed
(methods described below) and frequently will allow immediate rejection of
the null hypothesis that two samples have been drawn from the same
population. Simberloff [6] discusses a succession of methods involving
similarity indexes which become progressively more powerful statistically
as more detailed use is made of species’ identities [7]. One might also
wish to know, given some information about relative species abundances
in a parent population, what size sample (in terms of number of individuals


[Figure 1 appears here: expected number of species (ordinate) plotted against
number of individuals (abscissa) for three communities, curves a, b, and c.]


FIG. 1—Three possible rarefaction curves for different communities. The crossing of 
curves a and b could lead to erroneous interpretation, as described in the text.


152 BIOLOGICAL DATA IN WATER POLLUTION ASSESSMENT 


collected) would be likely to yield some specified percentage P of the
species present; Heck et al [3] demonstrate how in one sampling program
for benthic macroinvertebrates the nature of the species-abundance curve
was such that in samples only 70 percent as large as those originally
envisioned one could expect to find 90 percent of the species which would
have been found originally. Whether this increase in cost-effectiveness
outweighs the decrease in thoroughness depends, of course, on a project’s
goals. In the above instance, reducing sample size allowed an increase in
number of sampling stations.
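The sample-size question can be sketched in a few lines of Python (3.8 or later, for `math.comb`). This is an illustration only and is not the procedure of Heck et al [3]: the function names `expected_species` and `smallest_sample_for_fraction` are hypothetical, and the abundances are the pooled stations 6 + 7 column of Table 1, used here simply as a stand-in for a parent collection. The sketch scans sample sizes upward and returns the first one whose expected species count reaches a chosen fraction P of the species in the collection.

```python
from math import comb

def expected_species(abundances, n):
    """Expected number of species in a random sample of n individuals."""
    total = sum(abundances)
    denom = comb(total, n)
    return sum(1 - comb(total - a, n) / denom for a in abundances)

def smallest_sample_for_fraction(abundances, fraction):
    """Smallest n whose expected species count is at least fraction * S."""
    target = fraction * len(abundances)
    for n in range(1, sum(abundances) + 1):
        if expected_species(abundances, n) >= target:
            return n
    return sum(abundances)

# Pooled stations 6 + 7 from Table 1 (23 species, 1133 individuals).
pooled = [25, 21, 19, 12, 3, 4, 41, 45, 96, 286, 93, 126,
          8, 38, 24, 2, 6, 1, 3, 149, 78, 52, 1]

n90 = smallest_sample_for_fraction(pooled, 0.90)
print(n90, round(expected_species(pooled, n90), 2))
```

A linear scan is adequate for collections of this size; because the expected species count never decreases with sample size, a bisection search would also work for larger collections.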

I have pointed out that since rarefaction looks only at numbers and not
at identities, two samples from communities of very different organisms
may appear to be similar if, when rarefied to the same number of
individuals, they have the same number of species. A subtler potential for
error arises when one looks at the rarefaction curve for only one or a few
values of the abscissa (numbers of individuals). This is because, in essence,
an entire rarefaction curve depicts the complete arrangement of organisms
in a two-level hierarchy and is, therefore, a complex function of both
numbers of entities and the evenness of distribution of entities of the lower
level (for example, individuals) into entities of the higher level (for
example, species). One could conceive of a situation in which two
collections of the same entities (for example, species) were hierarchically
distributed such that their rarefaction curves, though very different,
crossed at one or more points (Fig. 1). If one compared only a sample
from one collection to the entire rarefaction curve for the other collection
at the abscissal value for one of these intersection points, he would be
unable to deduce that the sample could not have been drawn from the
collection which generated the complete rarefaction curve. In Fig. 1, the
communities generating rarefaction curves (a) and (b) bear this relationship
to one another, and it is clear that if one had a sample of I_1 or I_2
individuals from community (b) and compared it only to the rarefied value
S_1 or S_2, respectively, from the curve of community (a) by looking at the
ordinates corresponding to I_1 and I_2, he would not recognize the peculiarity
of his sample. However, if one generated the entire rarefaction curve
for the sample from community (b) and compared it to the
rarefaction curve for community (a), he would be unlikely to be misled. In
short, for single-value rarefaction to compare communities or samples,
either as a statistic to compare the richness component of diversity or as a
method to see if samples are drawn from some particular community, one
must assume that the rarefaction curves, like (a) and (c) in Fig. 1, have
the same “shape” and do not cross. Fortunately, this usually seems to be
the situation for properly computed rarefaction curves which have been
published in their entirety [7,5].
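The crossing of curves can be made concrete with a small Python sketch. The two communities below are invented for illustration and are not data from any study cited here; `expected_species` is a hypothetical helper name. One community is perfectly even with few species, the other has many species but one strong dominant, so the even community looks richer in small samples and poorer in large ones.

```python
from math import comb

def expected_species(abundances, n):
    """Expected number of species in a random sample of n individuals."""
    total = sum(abundances)
    denom = comb(total, n)
    return sum(1 - comb(total - a, n) / denom for a in abundances)

# Two hypothetical communities of 100 individuals each.
even_community = [10] * 10            # 10 species, perfectly even
skewed_community = [82] + [1] * 18    # 19 species, one dominant

for n in (5, 20, 50, 100):
    print(n,
          round(expected_species(even_community, n), 2),
          round(expected_species(skewed_community, n), 2))
```

A comparison made only at a sample size near the crossing point would suggest the two communities are alike, which is why the whole curve is the safer basis for comparison.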

Finally, rarefaction assumes a random spatial dispersion of individuals,
but it is a biological commonplace that most organisms tend to be
clumped. Fager [8] dealt with this problem, and though he did not use the
mathematically correct procedure to rarefy, his result is certainly correct:
the more clumped the populations in a community are, the more rarefaction
overestimates the number of species expected in a sample taken in
nature. The reason is that a real sample from an underdispersed community
is likely to draw individuals from only a few clumps (species), whereas
if the individuals were all mixed randomly, a sample would likely draw
from many species. Clumping is less likely to be a problem for mobile
organisms than for sessile ones, particularly when the sampler (for
example, a plankton net) moves over a large area. Fish netted over a large
section of a river are unlikely to reflect clumping. Generally, the larger the
sample sizes to be compared by rarefaction, the less likely will it be that
spatial underdispersion affects the results. In essence, a large sample is
effectively stratified in that it tends to collect from several clumps instead
of just one or two.
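The effect of clumping can be illustrated with a deliberately extreme, hypothetical layout (an assumption for illustration, not Fager's procedure): ten species of ten individuals each, arranged along a transect so that every species forms a single clump. A "real" sample is taken as a run of neighbouring individuals, and its species count is compared with the rarefaction expectation, which assumes random mixing.

```python
from math import comb

def expected_species(abundances, n):
    """Expected number of species in a random sample of n individuals."""
    total = sum(abundances)
    denom = comb(total, n)
    return sum(1 - comb(total - a, n) / denom for a in abundances)

# Hypothetical transect: 10 species, 10 individuals each, one clump per species.
abundances = [10] * 10
transect = [sp for sp, count in enumerate(abundances) for _ in range(count)]

n = 10
# Species counts for every possible sample of n neighbouring individuals.
window_counts = [len(set(transect[start:start + n]))
                 for start in range(len(transect) - n + 1)]
mean_clumped = sum(window_counts) / len(window_counts)
rarefied = expected_species(abundances, n)
print(round(mean_clumped, 2), round(rarefied, 2))
```

Widening the window so that it spans several clumps narrows the gap, which mirrors the remark that large samples are effectively stratified.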


Rarefaction in Pollution Ecology 


To exemplify rarefaction questions of the first form, I use data from the 
study by Cairns et al [9] on the recovery of pollution-damaged streams. 
Their Table 5 lists the numbers of individuals of fishes of different species 
collected at seven stations on the Roanoke River just after a major spill of 
ethyl benzene and creosote, and they used these data to analyze faunal 
effects of the spill. In particular, stations 6 and 7 (7.23 and 9.65 km below 
the spill, respectively) were felt to be similar to one another and to have 
diversities similar to the reference stations, especially station 2, above the 
spill site. Stations 3 and 4 (3.21 km below the spill) were felt not only to 
be depauperate but to be compositionally peculiar, with a superabundance 
of minnows. Table 1 lists the abundances of the 26 fish species at stations 
2, 4, 6, and 7. If these assertions are correct, then one ought to be able to 
pool the samples from stations 6 and 7 and by rarefaction show the 
following three things: 

1. If one draws the number of individuals in either sample 6 (628) or 
sample 7 (505) randomly from the pooled total, the expected number of 
species ought to be close to that actually observed; this is a test of the 
homogeneity of the relative abundance distributions of samples 6 and 7.

2. If one draws the number of individuals in sample 2 (429) randomly 
from the pooled 6 and 7, the expected number of species ought to be close 
to that observed (20). 

3. If one draws the number of individuals in sample 4 (118) randomly 
from the pooled 6 and 7, the expected number of species ought not to be 
close to that observed (8).
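The three predictions can be checked with a short Python sketch that rarefies the pooled 6 + 7 column of Table 1 to each station's sample size and reports the expected species count with a band of 1.96 standard deviations on either side. The function name `rarefy` is hypothetical; the expectation and variance follow the expressions given under Methods, evaluated here with exact binomial coefficients rather than the log-factorial table of the appendix program, so small numerical differences from the published figure are possible.

```python
from math import comb, sqrt

def rarefy(abundances, n):
    """Return (expected species, standard deviation) for a random sample of n."""
    total = sum(abundances)
    denom = comb(total, n)
    absent = [comb(total - a, n) / denom for a in abundances]  # P(species missing)
    expected = sum(1 - p for p in absent)
    variance = sum(p * (1 - p) for p in absent)
    for j in range(len(abundances)):
        for k in range(j + 1, len(abundances)):
            both = comb(total - abundances[j] - abundances[k], n) / denom
            variance += 2 * (both - absent[j] * absent[k])
    return expected, sqrt(max(variance, 0.0))

# Pooled stations 6 + 7 from Table 1 (23 species, 1133 individuals).
pooled = [25, 21, 19, 12, 3, 4, 41, 45, 96, 286, 93, 126,
          8, 38, 24, 2, 6, 1, 3, 149, 78, 52, 1]

# Station: (number of individuals, observed number of species), from Table 1.
observed = {"station 6": (628, 22), "station 7": (505, 19),
            "station 2": (429, 20), "station 4": (118, 8)}

for name, (n, s_obs) in observed.items():
    e, sd = rarefy(pooled, n)
    low, high = e - 1.96 * sd, e + 1.96 * sd
    print(f"{name}: n={n} observed={s_obs} "
          f"expected={e:.2f} band=({low:.2f}, {high:.2f})")
```

Because only counts enter the calculation, the comparison for station 2 ignores the fact that some of its species do not occur in the pooled sample at all; that is the weakness of rarefaction noted earlier.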

Figure 2 is a complete rarefaction curve for the pooled sample of 1133 
individuals achieved by rarefying at intervals of 20 individuals; the curve is 




TABLE 1 —Distribution of fish individuals into species at selected stations on the Roanoke 


River (data from Ref 9).

                                     Stations
Species No.                2      4      6      7    6+7
 1                        20      2     11     14     25
 2                        10      2     14      7     21
 3                         1      0      0      0      0
 4                        12      1      1     18     19
 5                         8      0      6      6     12
 6                         4      0      1      2      3
 7                         1      0      0      0      0
 8                        15      0      4      0      4
 9                        13      4     21     20     41
10                         8      0     10     35     45
11                        69      0     54     42     96
12                       151     15    144    142    286
13                        77     47     64     29     93
14                         8     46     50     76    126
15                         1      0      7      1      8
16                         0      0     35      3     38
17                         1      0      0      0      0
18                        24      1      8     16     24
19                         0      0      2      0      2
20                         2      0      0      0      0
21                         2      0      6      0      6
22                         2      0      0      1      1
23                         0      0      1      2      3
24                         0      0     88     61    149
25                         0      0     66     12     78
26                         0      0     34     18     52
27                         0      0      1      0      1
Number of individuals    429    118    628    505   1133
Number of species         20      8     22     19     23


bracketed by a band of ±1.96 standard deviations, which constitutes
approximate 95 percent confidence limits. Exact confidence limits
are not symmetric, though they are close to symmetric except at the ends of
the curve, and are laborious to calculate. For the numbers of individuals in
samples 2, 4, 6, and 7 (429, 118, 628, and 505, respectively) the observed
number of species is also indicated. It can be seen that all the predictions
are met except that sample 7 falls slightly outside the 95 percent confidence
band (but inside a 99 percent limit). Doubtless a similarity index
analysis would provide even stronger support for Cairns et al’s claims
(observe the general similarity of presences and abundances for samples 6
and 7, for example) but is beyond the scope of this paper. In any event,
with the simple technique of rarefaction we have given strong support to
the original contention. Finally, Cairns et al’s assertion of compositional
peculiarity of station 4 can also be tested precisely by a method very similar
to rarefaction involving enumeration of the species present in specific
randomly drawn samples. Simberloff [10] describes the use of this
technique to criticize the conclusion of MacArthur et al [11] that the




[Figure 2 appears here: number of species (ordinate) plotted against number
of individuals (abscissa, 0 to 1200).]


FIG. 2—Complete rarefaction curves for fish communities of the Roanoke River. Numbers 
refer to the collection stations described in the text. Data from Ref 9.


absence of certain large families of birds from the Pearl Islands is 
inexplicable if the species were but a random subset of the Panamanian 
species pool. 


Rarefaction in Evolutionary Ecology 


The second form of question which can be answered by rarefaction
concerns “taxonomic diversity,” or mean number of species per genus in
some biota [4]; of course an analogous statement may be made about
mean number of genera per family, etc. This is a widely misunderstood
statistic; it is not “a synonym or near-synonym for richness in number of
species,” as it is described by Carlquist [12], though the two statistics are
highly correlated (a point which is also misunderstood). The sense in
which a ratio like mean number of species per genus or mean number of
individuals per species represents some form of diversity is clear: a high
ratio of species per genus means that in a community, most of the species
belong to relatively few genera, or are in some respects similar to one
another. An example from the water pollution literature is eutrophication
by sewage, during which numbers of species may decrease, but numbers
of individuals of a few species (for example, certain blue-green algae and
diatoms) increase enormously. The ratio of individuals-to-species becomes
high, and diversity is low in the sense that most individuals are of a few
species. A similarly low diversity is manifested at the species-to-division
level, since all of the species are likely to be in only two divisions:
Cyanochloronta and Chrysophycophyta.

One can observe immediately that expected taxonomic diversity is 
calculated directly by dividing the observed number of species by the 
expected number of genera, provided by rarefaction. Confidence limits 
can be determined by dividing the observed number of species by the 
upper and lower confidence limits which rarefaction gives for number of 
genera. Elton’s suggestion [13] that the mean number of species per genus
(S/G ratio) could be used to infer the presence of competition particularly 
fostered interest in this statistic. He reasoned that if competition were 
important, one would expect it to act most strongly between congeneric 
species which, on the average, should be more similar ecologically than a 
randomly drawn pair of species. Consequently, there would be a tendency 
for species to exclude others in their genus, and this would lower S/G. 
Williams [2] summarizes other early work in this vein, and appears to 
have been the first to have observed that, since a random sample from a 
parent population will always have an expected S/G lower than that of the 
parent population itself, one cannot simply state that one biota with lower 
S/G than another biota’s is either undergoing less competition in the 
present or has the low S/G because of more intense competition in the 
past. A correction must be made for species number. 
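Williams' point, that a random subset of a pool has a lower expected S/G than the pool itself, can be illustrated in Python. The species pool below is hypothetical (invented genus sizes, not any real biota), and `expected_genera` is an illustrative name; the ratio printed is S divided by the expected number of genera, as described above.

```python
from math import comb

def expected_genera(genus_sizes, s):
    """Expected number of genera among s species drawn at random from the pool."""
    pool = sum(genus_sizes)
    denom = comb(pool, s)
    return sum(1 - comb(pool - g, s) / denom for g in genus_sizes)

# Hypothetical species pool: number of species in each genus.
genus_sizes = [12, 8, 6, 5, 4, 4, 3, 3, 2, 2, 2, 2] + [1] * 20
pool_species = sum(genus_sizes)
pool_ratio = pool_species / len(genus_sizes)

sample_sizes = (10, 20, 40, 60, pool_species)
ratios = [s / expected_genera(genus_sizes, s) for s in sample_sizes]
for s, ratio in zip(sample_sizes, ratios):
    print(s, round(ratio, 3))
print("pool S/G:", round(pool_ratio, 3))
```

The ratio climbs steadily toward the pool value as S grows, with no competition of any kind in the model; any comparison of S/G between biotas of different size therefore has to allow for species number first.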

Though Williams buttressed his contention with a few random drawings
of chips representing species with numbers on them representing genera,
the attribution of competitive significance to lower S/G per se persisted
[14,15]. In a summary of this problem [4] I used a computer simulation
to draw randomly 100 times a collection of S species from some appropriate
species pool and used the mean of these 100 values as the expected
S/G, testing for significance of deviation from the observed values of a
number of known biotas with a two-tailed Student’s t-test. The specific
conclusion was that, contrary to Grant’s assertion [14], there tend to be
slightly more congeners on an island with S species (higher S/G) than
would be expected for a random sample of S species from the mainland
species pool. Thus, even if competition on islands is more severe than on
the mainland, one certainly cannot demonstrate this by comparing S/G ratios.
The calculation of a correct analytic rarefaction method with confidence
limits [3] obviates the necessity for the simulation, though the result still
stands.
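The relation between the simulation and the analytic method can be sketched as follows. This is not the 1970 program: the pool is hypothetical, the function names are illustrative, the number of genera (rather than S/G itself) is averaged, and 2000 draws are used instead of 100 so that the comparison is stable. The seed is fixed only to make the run repeatable.

```python
import random
from math import comb

def expected_genera(genus_sizes, s):
    """Analytic expectation of the number of genera among s random species."""
    pool = sum(genus_sizes)
    denom = comb(pool, s)
    return sum(1 - comb(pool - g, s) / denom for g in genus_sizes)

def simulated_genera(genus_sizes, s, draws, rng):
    """Mean number of genera over repeated draws of s species without replacement."""
    labels = [g for g, size in enumerate(genus_sizes) for _ in range(size)]
    total = 0
    for _ in range(draws):
        total += len(set(rng.sample(labels, s)))
    return total / draws

# Hypothetical species pool: number of species in each genus.
genus_sizes = [12, 8, 6, 5, 4, 4, 3, 3, 2, 2, 2, 2] + [1] * 20
rng = random.Random(1970)   # fixed seed, chosen arbitrarily
s = 30

analytic = expected_genera(genus_sizes, s)
simulated = simulated_genera(genus_sizes, s, draws=2000, rng=rng)
print(round(analytic, 2), round(simulated, 2))
```

The two figures should agree to within sampling error of the simulation, which is the sense in which the analytic expression makes the repeated drawings unnecessary.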

Despite my paper, a number of authors still confer significance on S/G
ratios independent of the number of species. Johnston [16], analyzing the
Cayman Island birds, cites my 1970 paper as suggesting that “similar S/G
values for large and small islands would indicate . . . that competition is at
least as intense on the small islands.” Of course my discussion in the
preceding paragraph implies that S/G ratios are a poor way to attempt to
infer competition but that, if anything, equal values of S/G on small and
large islands (presumably with smaller and larger numbers of species,
respectively) would indicate less intense competition on the small islands,
or else how could more congeners exist than would have been expected?
Johnston proceeds to observe that the data of Table 2, columns 1 and 2,
indicate equal intensity of competition on the Caymans and Cuba. But if
one uses rarefaction to draw randomly from the Cuban avifauna subsets of
species equal in species numbers to the Cayman islands, one generates the
last three columns of Table 2. The uniformly and significantly higher
observed S/G than expected indicates that if the Cuban and Caymanian
birds are at all comparable, S/G ratios indicate rather less competition on
the Caymans than on Cuba.
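The expected ratios in Table 2 can be recovered from the sample output reproduced in the Appendix, where the Cuban avifauna (93 species in 74 genera) is rarefied to 29, 30, and 39 species. The Python sketch below divides S by the expected number of genera and, following the recipe given earlier for confidence limits, by the limits on the number of genera. The function name `expected_ratio` and the pairing of the three output rows with the three islands (by their species numbers in Table 2) are assumptions of this illustration.

```python
# (S, E(G), S.D. of G) for random subsets of the Cuban avifauna,
# copied from the sample output in the Appendix.
rarefied = {
    "Little Cayman": (29, 26.9966, 1.2303),
    "Cayman Brac": (30, 27.8568, 1.2638),
    "Grand Cayman": (39, 35.4006, 1.5287),
}

def expected_ratio(s, e_g, sd_g, z=1.96):
    """Expected S/G with limits obtained by dividing S by the limits on G."""
    ratio = s / e_g
    lower = s / (e_g + z * sd_g)   # more genera gives a smaller ratio
    upper = s / (e_g - z * sd_g)   # fewer genera gives a larger ratio
    return ratio, lower, upper

results = {island: expected_ratio(*values) for island, values in rarefied.items()}
for island, (ratio, lower, upper) in results.items():
    print(f"{island}: expected S/G = {ratio:.2f} ({lower:.2f} to {upper:.2f})")
```

The limits obtained this way are slightly asymmetric about the expected ratio, because a symmetric band on the number of genera is being inverted.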

Another incorrect use of S/G and related ratios also rests on a failure to
recognize how heavily dependent are the expected values of these ratios
on S, the number of species. Cook [17], examining variation in species
richness of birds, constructed two maps of North America, one (his Fig. 1)
with isopleths of species numbers, the other (his Fig. 6) with isopleths of
species-to-family (S/F) ratios. Observing that “in many ways this map [his
Fig. 6] is similar to the species density map [his Fig. 1],” Cook nonetheless
proceeds to attribute ecological significance (latitudinal gradients and
topographic and peninsula effects) to the S/F map. In particular, an
increase in S/F is thought to reflect an increase in vertical structure and
ecological complexity, thus allowing related species to coexist more easily
by reducing competition. The highest ratio (7.5 species/family) occurs in
the Mexican border region and is said to “reflect the increasing environmental
complexity in these regions, allowing the coexistence of more
species with generally similar ecological requirements.” But the high
correlation of the isopleths of the two maps is only a reflection of the
dependence of S/F on S and any further significance, including the
possibility of a relationship to competition intensity, remains unproven.
That the Mexican border region has the highest S/F ratio is an artifact of
its also having the greatest number of species. Balgooy [18] commits the
same error when comparing the flora of 75 islands and mainland regions.
His Fig. 2 plots number of genera against log area, while his Fig. 4 plots
the genus-to-family (G/F) ratio against log area. He observes that the two
plots “show strong resemblance,” but nonetheless asserts that G/F increases
with area. This increase is simply a passive consequence of the

TABLE 2—Rarefaction values for the birds of Cuba and the Cayman Islands (data from 
Refs 27 and 16). 


Island            S/G     E(S/G)    Standard Deviation     S
Little Cayman     1.25    1.07      0.05                   29
Cayman Brac       1.27    1.08      0.05                   30
Grand Cayman      1.18    1.10      0.05                   39
Cuba              1.26                                     93




high correlation between area and number of genera G. A stepwise
multiple regression (SPSS, Version 6.5) with Balgooy’s data shows a
multiple R² of 0.904 (P < 0.001) when G/F is regressed only on G, while
addition of area as a second independent variable raises R² only to 0.912
(P < 0.001). The increase is significant at the 0.013 level, but it is clear
from my discussion that the correlation of G/F with area (0.483) primarily
reflects the correlation of G/F with G (0.951) and G with area (0.585). A
similar treatment of the relationship between area and S/G for land birds
and vascular plants of a number of island groups is presented in my 1970
paper. Table 4 of that paper demonstrates that S accounts for almost all
the variation in S/G, and addition of area as an independent variable
rarely effects a significant increase in R².
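The arithmetic behind this argument can be retraced from the three correlations quoted above, without Balgooy's raw data. The Python sketch below uses the standard identity for the squared multiple correlation with two predictors and the first-order partial correlation; it is a check on internal consistency under that assumption, not a rerun of the SPSS analysis, and the variable names are illustrative.

```python
from math import sqrt

# Correlations quoted in the text for Balgooy's data.
r_ratio_genera = 0.951   # G/F with G
r_ratio_area = 0.483     # G/F with area
r_genera_area = 0.585    # G with area

# R^2 for G/F regressed on G alone, and on G together with area.
r2_genera_only = r_ratio_genera ** 2
r2_both = (r_ratio_genera ** 2 + r_ratio_area ** 2
           - 2 * r_ratio_genera * r_ratio_area * r_genera_area) / (
               1 - r_genera_area ** 2)

# Correlation of G/F with area once G is held constant.
partial_area = (r_ratio_area - r_ratio_genera * r_genera_area) / sqrt(
    (1 - r_ratio_genera ** 2) * (1 - r_genera_area ** 2))

print(round(r2_genera_only, 3), round(r2_both, 3), round(partial_area, 3))
```

Both R² values come out within rounding of those reported, and the gain from adding area is under one percentage point. The partial correlation is also not positive, so once G is accounted for, the quoted correlations give no sign of the increase of G/F with area that Balgooy asserts.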

Let us consider one further misunderstanding concerning S/G and
related ratios. Regarding the conclusions of my 1970 paper, Carlquist [12]
states that “they clearly do not apply to the Hawaiian islands,” and “may
or may not apply to other situations.” From his Table 3.1, I have extracted
the relevant data on island angiosperms which Carlquist amassed; they are
reproduced in Table 3. My main conclusion in 1970 had been that S/G for
an island was usually approximately what would be expected (or slightly
greater) given the number of species on the island; there is no evidence in
Table 3 to argue against this assertion, and without a species list from a
well-delineated species pool no test can be performed. The subsidiary
conclusion, that S/G (and S/F) are primarily functions of S, is well
supported by these data, and this militates against Carlquist’s interpretation
of these figures as relevant to questions of speciation rate and adaptive
radiation. Spearman’s rank correlation, r_s, is 0.858 (P < 0.005) between
S and S/G and 0.973 (P < 0.005) between S and S/F, suggesting that the
ratios are primarily passive consequences of species number. In particular,
the Hawaiian Islands have by far the highest ratios and also the highest
number of species.
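The rank correlations can be recomputed from Table 3 with a few lines of Python. The sketch assigns average ranks to ties and takes the Pearson correlation of the ranks; since the ratios in the table are rounded to one decimal and the original treatment of ties is not stated, the results should land close to the published 0.858 and 0.973 without necessarily matching them in the third decimal. The function names are illustrative.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's r_s computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Table 3, islands in the order listed (St. Helena through Hawaiian).
species = [45, 115, 146, 151, 276, 386, 556, 705, 826, 826, 1200]
s_per_g = [1.3, 1.2, 1.6, 1.4, 1.3, 2.0, 1.6, 2.4, 1.8, 2.1, 5.6]
s_per_f = [2.5, 3.0, 3.4, 2.6, 4.2, 5.8, 5.9, 6.8, 8.3, 10.6, 14.5]

rs_genus = spearman(species, s_per_g)
rs_family = spearman(species, s_per_f)
print(round(rs_genus, 3), round(rs_family, 3))
```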

Since the plants of Table 3 were clearly drawn from different species 
pools one might suggest that the slight deviations from perfect rank 


TABLE 3—Numbers of angiosperm species plus species-to-genus and species-to- family 
ratios for several islands and island groups (data from Ref 12). 


Island            Number of Species    S/G     S/F
St. Helena                45           1.3     2.5
Annobon                  115           1.2     3.0
Juan Fernandez           146           1.6     3.4
Marquesas                151           1.4     2.6
Principe                 276           1.3     4.2
Galapagos                386           2.0     5.8
São Tomé                 556           1.6     5.9
Mauritius                705           2.4     6.8
Fernando Po              826           1.8     8.3
Canary                   826           2.1    10.6
Hawaiian                1200           5.6    14.5




correlation between S and S/G, and S and S/F, respectively, are attributable
to this source. For the four Gulf of Guinea islands in Table 3
(Annobon, Principe, São Tomé, and Fernando Po) the rankings for both
ratios are identical to those for species numbers. These islands probably
have a common species pool, the plants of West Equatorial Africa, but I
can find no single listing of the species either on the islands or in the pool.
However, Lems [19] provides an annotated species list of Canary Island
angiosperms, which I will use both as an example of rarefaction to examine
ratios and as a demonstration that my 1970 conclusions are universal when
the appropriate data can be examined.

I have assumed that all species occurring on any of the seven islands are
available as colonists, and so have used the pooled flora for all islands to
represent the species pool. Preferable would have been a separate list of
the plants of nearby mainland Africa, but in the absence of an independent
pool one may lump together the biotas of the individual entities of interest
(in this instance, islands) so long as one has reason to believe that most
elements (in this instance, plant species) are, in fact, available as potential
colonists of all the entities. This procedure is analogous to lumping
together samples 6 and 7 in the examination of Roanoke River fish
distributions (Table 1). The statistics on Canary Island angiosperms are
presented in Table 4. Spearman’s rank correlation, r_s, is 0.714 (0.05 < P
< 0.10) for S and S/G, and 0.893 (P = 0.01) for both S and S/F, and G
and G/F, implying that S/G and S/F ratios are primarily reflections of S,
and G/F is largely determined by G. Actually, all of the deviations from
perfect rank order correlation arise from the two islands with fewest
species and genera, Fuerteventura and Lanzarote; these two form a
distinct subset, being close to one another, and lower, more arid, and
much nearer the coast than the other islands. It is likely in retrospect that
they do not draw their colonists from the same species pool as the other
islands. In any event, even though the Canary Islands are apparently not
best treated as a homogeneous grouping in this sense, it is clear (Fig. 3)
that all the observed ratios are rather close to the expected values, though
most of the differences are statistically significant.
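How close the observed ratios are to expectation can be expressed in standard-deviation units. The Python sketch below does this for S/G using values read from Table 4. The assignment of the expected values to islands is an assumption of this illustration, checked only by confirming that each observed ratio equals the listed species count divided by the listed genus count; the name `standardized_deviation` is hypothetical.

```python
# Island: (observed S/G, expected S/G, standard deviation), as read from Table 4.
table4_s_per_g = {
    "Lanzarote": (1.737, 1.639, 0.052),
    "Fuerteventura": (1.717, 1.617, 0.052),
    "Gran Canaria": (1.918, 2.078, 0.046),
    "Tenerife": (2.256, 2.372, 0.038),
    "Gomera": (1.697, 1.812, 0.050),
    "Palma": (1.750, 1.868, 0.050),
    "Hierro": (1.593, 1.666, 0.052),
}

def standardized_deviation(observed, expected, sd):
    """Deviation of the observed ratio from expectation, in S.D. units."""
    return (observed - expected) / sd

z_scores = {island: standardized_deviation(*values)
            for island, values in table4_s_per_g.items()}
for island, z in z_scores.items():
    position = "outside" if abs(z) > 1.96 else "inside"
    print(f"{island}: z = {z:+.2f} ({position} the approximate 95 percent band)")
```

On this reading the two eastern islands lie above expectation and the other five below, which fits the suggestion that Fuerteventura and Lanzarote do not share the species pool of the remaining islands.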

One ought not to be left with the feeling that the tale of rarefaction in
evolutionary ecology is one only of misuse and disuse. For example, Raup
[5] has rarefied effectively several large samples of echinoids to show that
the observed increase in the number of echinoid families since the
Paleozoic is real and cannot be explained solely by the increase in numbers
of preserved species. Further, he demonstrated that the increase occurred
primarily before the mid-Cretaceous. Raup [20] used related reasoning to
examine the probabilities of finding all taxa of different taxonomic levels
(genera, families, orders, phyla) in a sample with a given number of
species. This is equivalent to seeking the point at which a rarefaction curve
becomes asymptotic, as Heck et al did to determine the necessary sample


TABLE 4—Species, genus, and family numbers and rarefaction ratios for Canary Islands plants (data from Ref 19). 


Island    Pool           Lanzarote      Fuerteventura  Gran Canaria   Tenerife       Gomera         Palma          Hierro
Species   1417           356            340            729            1024           492            539            376
Genera    521            205            198            380            454            (missing)      308            236
Families  106            55             55             90             101            (missing)      78             67
S/G       2.720          1.737          1.717          1.918          2.256          1.697          1.750          1.593
E(S/G)                   1.639(0.052)   1.617(0.052)   2.078(0.046)   2.372(0.038)   1.812(0.050)   1.868(0.050)   1.666(0.052)
S/F       13.368         6.473          6.182          8.100          10.139         6.560          6.910          5.612
E(S/F)                   5.425(0.293)   5.278(0.290)   8.506(0.321)   10.681(0.294)  6.611(0.311)   7.001(0.315)   5.606(0.296)
G/F       4.915          3.727          3.600          4.222          4.495          3.867          3.949          3.522
E(G/F)                   3.065(0.166)   3.017(0.166)   4.151(0.135)   4.562(0.101)   3.617(0.157)   3.728(0.154)   3.273(0.164)

NOTE: Parenthetic values are standard deviations.


[Figure 3 appears here: ratio (ordinate) plotted against number of species or
genera (abscissa, 200 to 1400).]


FIG. 3—S/G (open circles), S/F (closed circles), and G/F (triangles) ratios for the Canary 
Islands. Expected values are indicated by crosses, and the vertical lines represent 95 percent
confidence limits. S = Species pool, T = Tenerife, C = Gran Canaria, P = Palma, G = 
Gomera, H = Hierro, L = Lanzarote, F = Fuerteventura. Data from Ref 19. 


size (in number of individuals) required to find, with known probability, 
some predetermined fraction of all the species present. But certainly the 
predominant picture provided by the foregoing examples portrays a 
moderately powerful technique which has so far largely been neglected 
and which is potentially confusing. 


Methods 


Sanders [1] published an algorithm for generating rarefaction curves,
but Hurlbert [21], Fager [8], and Simberloff [22] all demonstrated that
this algorithm consistently overestimated the true number of species (S_i)
which would be expected in a sample of I_i individuals drawn randomly
from a larger collection. My paper did this by repeated computer-simulated
drawings, introducing space-age technology into Williams’ [2]
numbered-chip method; confidence intervals were calculated from the repeated
draws. Hurlbert [21] derived an analytic expression for the expected value
of S_i, hereafter denoted E(S_i)


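$$E(S_i) = \sum_{j=1}^{S} \left[ 1 - \frac{\binom{I - I_j}{I_i}}{\binom{I}{I_i}} \right] \qquad (1)$$

where

S = total number of species in the collection,
I = total number of individuals in the collection, and
I_j = number of individuals in species j in the collection.

Heck et al [3] provided the variance

$$\operatorname{var}(S_i) = \binom{I}{I_i}^{-1} \left[ \sum_{j=1}^{S} \binom{I - I_j}{I_i} \left( 1 - \frac{\binom{I - I_j}{I_i}}{\binom{I}{I_i}} \right) + 2 \sum_{j<k} \left( \binom{I - I_j - I_k}{I_i} - \frac{\binom{I - I_j}{I_i} \binom{I - I_k}{I_i}}{\binom{I}{I_i}} \right) \right]$$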


Fortunately, both this formidable expression and the expected value,
E(S_i), are readily calculated by a computer program, reproduced in the
Appendix. On the Florida State University CDC Cyber 73 Computer, the
program generated the data for Fig. 2 in about 10 s and the data for Fig. 3
in about 200 s. It is critically important to calculate the variance; for
example, Abele and Walters [23] have shown that when Sanders’ [1]
rarefaction curves are correctly calculated, the confidence limits are wide
enough so that the differences between many pairs of them, on which
Sanders based his widely cited “stability-time” hypothesis of marine
benthic diversity, are not statistically significant.
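For readers working outside FORTRAN, the core of the calculation can be sketched in Python. The appendix program avoids enormous binomial coefficients by working with a table of log factorials; the sketch below does the same through `math.lgamma` and checks the result against exact integer arithmetic. The abundances are hypothetical and the function names are illustrative, so this should be read as a small cross-check rather than a port of the program.

```python
from math import comb, exp, lgamma

def log_factorial(k):
    """ln(k!) via the log-gamma function, in place of a stored table."""
    return lgamma(k + 1)

def absence_probability(total, abundance, n):
    """C(total - abundance, n) / C(total, n), evaluated on the log scale."""
    if total - abundance < n:
        return 0.0   # too few other individuals: the species cannot be missed
    return exp(log_factorial(total - abundance)
               - log_factorial(total - abundance - n)
               + log_factorial(total - n)
               - log_factorial(total))

def expected_species(abundances, n):
    """Expected number of species in a random sample of n individuals."""
    total = sum(abundances)
    return sum(1 - absence_probability(total, a, n) for a in abundances)

abundances = [50, 30, 10, 5, 3, 1, 1]   # hypothetical collection of 100 individuals
n = 20
total = sum(abundances)
exact = sum(1 - comb(total - a, n) / comb(total, n) for a in abundances)
print(round(expected_species(abundances, n), 6), round(exact, 6))
```

The log-scale form is the one to prefer in languages without arbitrary-precision integers, since the binomial coefficients themselves overflow long before the collections become large.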

Despite the rather prompt demonstration that the original rarefaction
algorithm was incorrect, it is still used to generate rarefaction curves
[24-26]. The program is published in this symposium volume in the hopes of
fostering correct and more frequent use of this technique. In addition to
the ecological, geological, and evolutionary uses suggested and documented
here, one can conceive of applications of rarefaction to anthropology,³
linguistics (it seems an ideal method for comparing vocabulary size,
though it has not been so used to my knowledge), and probably other
disciplines as well. A listing of the program is in the Appendix.


Acknowledgments 
I thank the organizers of this symposium, especially Dr. R. J. Livingston,
for encouraging me to participate, and the Florida State University
Computing Center for computing funds.


³ Pohl, M., Department of Anthropology, Florida State University, personal
communication.




APPENDIX 


Rarefaction Program SIM 


Following is a reproduction of the rarefaction program SIM. The labels are for
individuals and species, but the program can as well be used for any hierarchical
classification. The language is FORTRAN IV (CDC), and the arrays are dimensioned
so that maximum number of individuals in any one species is 6000, while
maximum number of species is 550.


      PROGRAM SIM (INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
C     N=NUMBER OF INDIVIDUALS IN LARGEST SAMPLE
C     NS=NUMBER OF SPECIES IN LARGEST SAMPLE
C     NN(J)=NUMBER OF INDIVIDUALS IN JTH MOST FREQUENT SPECIES IN
C     LARGEST SAMPLE
C     NUM=NUMBER OF SAMPLES TO BE DRAWN RANDOMLY FROM LARGEST SAMPLE
C     INTS(J)=NUMBER OF INDIVIDUALS IN JTH RANDOMLY DRAWN SAMPLE (IN
C     INCREASING ORDER)
  100 FORMAT(1H )
  101 FORMAT(2X,1HN,7X,4HE(S),6X,4HS.D.,6X,8HVARIANCE)
  102 FORMAT(15X,1HN,9X,2HNS)
  103 FORMAT(10X,*NUMBERS OF INDIVIDUALS IN THE DIFFERENT SPECIES*)
      DIMENSION AFAC(6000),NN(550),T(550),P(550),INTS(550)
C     AFAC(K)=LOG OF K FACTORIAL. THE LOOP STOPS AT 5999 SO THAT
C     AFAC(I+1) STAYS WITHIN THE 6000 DIMENSIONED ELEMENTS
      DO 10 I=1,5999
      AFAC(1)=0
   10 AFAC(I+1)=AFAC(I)+ALOG(FLOAT(I+1))
      WRITE(6,4)
    4 FORMAT(*1*)
      READ(5,2)N,NS,NUM
      WRITE(6,102)
      WRITE(6,5)N,NS
    5 FORMAT(10X,2I10)
      WRITE(6,100)
      READ(5,2)(NN(J),J=1,NS)
      WRITE(6,103)
      WRITE(6,6)(NN(J),J=1,NS)
    6 FORMAT(10X,16I5)
    2 FORMAT(16I5)
      WRITE(6,100)
      WRITE(6,101)
      READ(5,2)(INTS(J),J=1,NUM)
C     ONE PASS OF LOOP 12 FOR EACH RAREFIED SAMPLE SIZE M
      DO 12 L3=1,NUM
      M=INTS(L3)
      VAR=0.
      SM=0.
C     P(I)=PROBABILITY THAT SPECIES I IS ABSENT FROM A SAMPLE OF M
      DO 11 I=1,NS
      NNI=NN(I)
      IF(N-NNI-M.LE.0) GO TO 11
      P(I)=AFAC(N-NNI)-AFAC(N-NNI-M)+AFAC(N-M)-AFAC(N)
      P(I)=EXP(P(I))
      SM=SM+P(I)
      VAR=VAR+P(I)*(1-P(I))
   11 CONTINUE
C     PIJ=PROBABILITY THAT SPECIES I AND J ARE BOTH ABSENT
      DO 13 I=1,NS
      IF(NS-I.LE.0) GO TO 13
      K=I+1
      DO 13 J=K,NS
      NNI=NN(I)
      NNJ=NN(J)
      PIJ=0.
      IF(N-NNI-NNJ-M.LE.0) GO TO 13
      PIJ=AFAC(N-NNI-NNJ)-AFAC(N-NNI-NNJ-M)+AFAC(N-M)-AFAC(N)
      PIJ=EXP(PIJ)
      VAR=VAR+2*(PIJ-P(I)*P(J))
   13 CONTINUE
      ANS=NS
      SD=SQRT(VAR)
      SM=ANS-SM
      WRITE(6,3)M,SM,SD,VAR
    3 FORMAT(I5,2X,3F10.4)
   12 CONTINUE
      STOP
      END
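The program reads its data in fixed-width integer fields, 16 to a record (FORMAT(16I5)): first N, NS, and NUM, then the NS abundances, then the NUM sample sizes in increasing order. As a convenience, and assuming that reading of the listing is right, the following Python sketch lays out such an input deck. The helper names are hypothetical, and the example uses the pooled stations 6 + 7 column of Table 1 with the four station sample sizes.

```python
def i5_records(values, per_line=16):
    """Format integers as fixed-width I5 fields, 16 to a record."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("".join(f"{v:5d}" for v in chunk))
    return lines

def sim_input_deck(abundances, sample_sizes):
    """Records for N, NS, NUM, then abundances, then sample sizes."""
    counts = sorted(abundances, reverse=True)     # most frequent species first
    deck = i5_records([sum(counts), len(counts), len(sample_sizes)])
    deck += i5_records(counts)
    deck += i5_records(sorted(sample_sizes))      # increasing order
    return deck

# Pooled stations 6 + 7 from Table 1 (23 species, 1133 individuals).
pooled = [25, 21, 19, 12, 3, 4, 41, 45, 96, 286, 93, 126,
          8, 38, 24, 2, 6, 1, 3, 149, 78, 52, 1]

deck = sim_input_deck(pooled, [118, 429, 505, 628])
for record in deck:
    print(record)
```

Values of 100000 or more would not fit an I5 field, but the array dimensions of the program rule out collections of that size in any case.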


Sample Output for Rarefaction Program SIM 


Following is a sample output for rarefaction program SIM run on the birds of 
Cuba, with the individuals-species classification representing species-genera. 


               N         NS
                  93        74

          NUMBERS OF INDIVIDUALS IN THE DIFFERENT SPECIES

          [74 genus sizes, 16 per line; not legible in this copy]

  N       E(S)      S.D.      VARIANCE
   29     26.9966    1.2303    1.5137
   30     27.8568    1.2638    1.5973
   39     35.4006    1.5287    2.3370

References 


[1] Sanders, H. L., American Naturalist, Vol. 102, 1968, pp. 243-282.

[2] Williams, C. B., Patterns in the Balance of Nature and Related Problems in Quantitative 
Ecology, Academic Press, New York, 1964. 

[3] Heck, K. L., Jr., van Belle, G., and Simberloff, D. S., Ecology, Vol. 56, 1975, pp. 
1459-1461. 

[4] Simberloff, D. S., Evolution, Vol. 24, 1970, pp. 23-47. 

[5] Raup, D. M., Paleobiology, Vol. 1, 1975, pp. 333-342. 

[6] Simberloff, D. S., American Naturalist, Vol. 112, 1977. 

[7] Grassle, J. F. and Smith, W., Oecologia, Vol. 25, 1976, pp. 13-22. 

[8] Fager, E. W., American Naturalist, Vol. 106, 1972, pp. 293-310. 

[9] Cairns, J., Crossman, J. S., Dickson, K. L., and Herricks, E. E., Bulletin, Association 
of Southeastern Biologists, Vol. 18, 1971, pp. 79-106. 

[10] Simberloff, D. S., Annual Review of Ecology and Systematics, Vol. 5, 1974, pp. 161-
182.

[11] MacArthur, R. H., Diamond, J. M., and Karr, J., Ecology, Vol. 53, 1972, pp. 330-
342.

[12] Carlquist, S., Island Biology, Columbia University Press, New York, 1974.

[13] Elton, C. S., Journal of Animal Ecology, Vol. 15, 1946, pp. 54-68. 

[14] Grant, P. R., American Naturalist, Vol. 100, 1966, pp. 451-462.

[15] Moreau, R. E., The Bird Faunas of Africa and Its Islands, Academic Press, London,
1966.

[16] Johnston, D., Bulletin Florida State Museum, Biological Science, Vol. 19, 1975, pp. 
235-300. 




[17] Cook, R. E., Systematic Zoology, Vol. 18, 1969, pp. 63-84. 

[18] Balgooy, M. M. J. van, Blumea, Vol. 17, 1969, pp. 139-178.

[19] Lems, C., Sarracenia, Vol. 5, 1960, pp. 1-94. 

[20] Raup, D. M., Science, Vol. 177, 1972, pp. 1065-1071. 

[21] Hurlbert, S. H., Ecology, Vol. 52, 1971, pp. 577-586.

[22] Simberloff, D. S., American Naturalist, Vol. 106, 1972, pp. 414-418. 

[23] Abele, L. G. and Walters, K., “Marine Benthic Diversity: a Critique and Alternative
Explanation,” Journal of Biogeography, in press.

[24] Stanton, R. J., Jr., and Evans, I., Journal of Paleontology, Vol. 46, 1972, pp. 845- 
858. 

[25] Calef, C. E. and Hancock, N. J., Paleontology, Vol. 17, 1974, pp. 779-810.

[26] Duff, K. L., Paleontology, Vol. 18, 1975, pp. 443-482. 

[27] Bond, J., Check-List of Birds of the West Indies, 3rd edition, Academy of Natural
Sciences, Philadelphia, 1950.


